Here is a demonstration of how to train the encoder to learn the parameters $\theta$ of a partial differential equation (PDE) from observations of the solution $u_\theta$. Strictly speaking, this procedure doesn't require $u_\theta$ to be the solution of a PDE; it just needs to be some family of functions parameterized by $\theta$. But we will consider the PDE setting, as it is the intended usage of this autoencoder.
To that end, let $\theta = (c, b_0, b_1)$ be (constant) parameters for the Poisson equation with Dirichlet boundary conditions on the unit interval. That is, $u_\theta$ is the solution to
$$ \begin{align*} -\Delta u &= c \quad\text{on}\quad [0,1] \\ u(0) &= b_0 \\ u(1) &= b_1 \end{align*} $$
Since $c$ is constant, this equation is readily solved for $u_\theta = u(x)$ as the family of parabolas
$$ u(x) = -\frac{c}{2} x^2 + \left( \frac{c}{2} + b_1 - b_0 \right) x + b_0 $$
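As a quick sanity check (not part of the project code), we can verify this formula symbolically with sympy:
import sympy as sp

x, c, b0, b1 = sp.symbols('x c b0 b1')
u = -c/2 * x**2 + (c/2 + b1 - b0) * x + b0

assert sp.simplify(-sp.diff(u, x, 2) - c) == 0  # -u'' = c on [0, 1]
assert sp.simplify(u.subs(x, 0) - b0) == 0      # u(0) = b0
assert sp.simplify(u.subs(x, 1) - b1) == 0      # u(1) = b1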
Now we will walk through training the encoder in five main steps: configuring the equation, loading the data, designing the network, training, and evaluating the fit.
To train the encoder to learn $\theta$,
we simply create a Python script equations/poisson.py
that implements functions defining the domain, simulating replicates of $\theta$,
and computing $u_\theta$ for a given replicate, along with an (optional) list of parameter names.
There is a template configuration file in the equations directory.
Let's briefly walk through the Poisson configuration file.
First we import numpy and create a list containing strings to name the parameters in $\theta$.
In this case the names are $c$, $b_0$, and $b_1$.
The strings may be formatted with matplotlib's TeX rendering syntax.
import numpy as np
# PARAMETER NAMES
theta_names = [r'$c$', r'$b_0$', r'$b_1$']
Next we write a quick function that computes a lattice over the domain, in this case the unit interval.
# DOMAIN
def domain():
    """
    The unit interval as an array of length 100
    """
    return np.linspace(0., 1., num=100)
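For example, calling it returns the expected lattice length and endpoints:
x = domain()
print(x.shape)      # (100,)
print(x[0], x[-1])  # 0.0 1.0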
Now we write a function that takes $\theta$ as the input,
and computes the solution function $u_\theta = u(x)$ over
the lattice defined above.
This should be written assuming the argument theta is
a one-dimensional numpy array that contains the parameter values.
# SOLUTION
def solution(theta):
    """
    Computes the solution u where theta=[c, b0, b1]
    """
    x = domain()
    c = theta[0]
    b0 = theta[1]
    b1 = theta[2]
    u = -c * x**2 / 2 + (c/2 + b1 - b0) * x + b0
    return u
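For example, with the (hypothetical) parameter values $c = 100$, $b_0 = -10$, $b_1 = 10$:
u = solution(np.array([100., -10., 10.]))
print(u.shape)      # (100,)
print(u[0], u[-1])  # -10.0 10.0, the boundary values b0 and b1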
Now to simulate replicates of our parameters,
we draw from uniform distributions and stack the arrays
so that the output has shape (replicates, num_params).
# THETA
def theta(replicates):
    """
    Simulates replicates of theta
    Output has shape (replicates, num_params)
    """
    # create replicates of theta=[c, b0, b1]
    c = np.random.uniform(-100, 100, size=replicates)
    b0 = np.random.uniform(-10, 10, size=replicates)
    b1 = np.random.uniform(-10, 10, size=replicates)
    theta = np.stack((c, b0, b1), axis=1)
    return theta
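A quick check of the output shape:
thetas = theta(replicates=5)
print(thetas.shape)  # (5, 3): one row per replicate, columns [c, b0, b1]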
With equations/poisson.py properly configured, we need only run
python train_encoder.py --trainon poisson.
The file train_encoder.py has three main parts to it:
load data, define neural network hyperparameters, train model.
Instead of running this script, let's break it down here.
The modules loaded in train_encoder.py are in the Poisson package.
We load (and possibly simulate) data using the
Poisson.equation.Dataset class,
which simply needs the name of the configuration file, excluding the file suffix.
from Poisson.equation import Dataset
# LOAD DATA
dataset = Dataset('poisson')
dataset.load()
The data are now loaded into the Dataset instance dataset,
already split into training, validation, and test sets.
It loads the data from the pickled file data/poisson.pkl,
but if the file doesn't exist the data are simulated
according to the configuration file using a default of 2000 replicates.
The parameter names and functions defined in the configuration file
are loaded and stored in the Dataset class.
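Assuming each split is stored as an (inputs, targets) pair, which is consistent with the dataset.train[1] indexing used further below, we can inspect the shapes:
Phi_train, theta_train = dataset.train
print(Phi_train.shape)    # (n_train, 100): u evaluated over the lattice
print(theta_train.shape)  # (n_train, 3): the parameters [c, b0, b1]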
We can use this to explore the data before designing the network.
For instance, we'll plot $u_\theta = u(x)$ for two realizations of $\theta$,
with the parameter values shown in the titles.
dataset.plot_solution()
Indeed, we can confirm the left and right endpoints of the above curves are given by $b_0$ and $b_1$, respectively. Furthermore, the parameter $c$ measures the negative curvature of the solutions, as is evident from the Poisson equation.
Henceforth, we let $\Phi$ denote the observed data that serve as inputs to the neural net. The ground-truth parameter is $\theta_\Phi$, while $\theta$ denotes the network's estimate, which varies with the weights during training. Thus, $u_\theta$ represents a reconstruction of $\Phi$.
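In this notation, and writing $E_w$ for the encoder with weights $w$ (our notation, not an identifier from the code), training minimizes the mean squared error between the predicted and ground-truth parameters:
$$ \min_w \; \mathbb{E}_\Phi \left[ \left\| E_w(\Phi) - \theta_\Phi \right\|_2^2 \right] $$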
Now that the data are loaded, we can design a neural network to train. We will construct a feed-forward network with one hidden layer containing 20 nodes. By default the data are rescaled to be in the interval $[-1, 1]$, so $\tanh$ activations are a natural choice. Since this is a regression task, the final layer will be activated with the identity function $f(x) = x$. To avoid overfitting we'll drop 10% of the connections between the hidden and final layer. The optimizer and loss will be Adam and mean square error, respectively. Let's train the network for 25 epochs using batch sizes of 25.
Conveniently, this is all recorded in a design dictionary below.
Most keys make intuitive sense, but the unit_activations key
is perhaps more cryptic. It is a list of tuples, where each tuple gives
the number of units and the activation for one layer (in this case, two layers).
We'll leave the callbacks list empty for now, but if we were training longer
we might decide to use a learning rate schedule or Tensorboard callbacks.
out_units = dataset.train[1].shape[1]
## DEFINE MODEL PARAMETERS
design = {'unit_activations': [(20, 'tanh'),
                               (out_units, 'linear')],
          'dropout': [0.1],
          'optimizer': 'adam',
          'loss': 'mse',
          'callbacks': [],
          'batch_size': 25,
          'epochs': 25,
          }
All that is left is to train the encoder.
We use the Encoder class from the
Poisson.encoder module.
For reproducible results we'll set a seed in TensorFlow
before training.
from Poisson.encoder import Encoder
from tensorflow.random import set_seed
set_seed(23) # for reproducibility
# TRAIN MODEL
model = Encoder(design)
model.train()
Great, training complete! Now let's see how the network performs
when predicting on a test set. We'll use the Encoder.plot_theta_fit()
method to predict on the test set, print the component-wise MSE,
and plot the predictions against the ground truth.
Since $c$, $b_0$, and $b_1$ were simulated on different scales,
we should compare them on a common scale by
dividing each component by the max absolute value
of the predicted and true parameters.
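That rescaling amounts to something like the following sketch (theta_pred and theta_true are hypothetical arrays, not identifiers from the package):
def common_scale(theta_pred, theta_true):
    # divide each parameter column by the max absolute value
    # over both the predicted and the true values
    scale = np.abs(np.concatenate((theta_pred, theta_true))).max(axis=0)
    return theta_pred / scale, theta_true / scale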
model.plot_theta_fit()
Not too shabby. Now let's see how the encoder performs when we hand it noisy data. That is, we test it on $\Phi + \varepsilon$, where $\varepsilon \sim \operatorname{Normal}(0, \sigma)$ is Gaussian noise with standard deviation $\sigma$.
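Generating the noisy inputs is straightforward; for a test input array Phi_test (a hypothetical name), the perturbation is just:
Phi_noisy = Phi_test + np.random.normal(loc=0., scale=sigma, size=Phi_test.shape)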
model.plot_theta_fit(sigma=1.0)
Since the purpose is to attach $\theta$ to a decoder that outputs $u_\theta$, an approximation of $\Phi$, we should also decode the predicted parameters to produce the analytic solutions they give rise to. As expected, the true and predicted solutions are quite close.
model.plot_solution_fit()
Looks pretty good. Just as before, let's see the solution space fit when we hand the model $\Phi + \varepsilon$.
model.plot_solution_fit(sigma=1.0)
Here we measure uncertainty with a bootstrap approach.
We sample 60% of the training set with replacement,
fit a model to the sample,
then predict on a fixed test set.
This is repeated num_boots times, and the results
can be visualized using the model.plot_theta_boot()
and model.plot_solution_boot() methods.
The credible regions for $\theta$ are found from
the bootstrap sampling distributions.
Gaussian noise can be added on top of either the
training or test set by specifying its standard deviation
via train_sigma or test_sigma, respectively. A sketch of the procedure is given below.
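Conceptually, the bootstrap does something like the following (a sketch only; fit_and_predict is a hypothetical stand-in for training a fresh Encoder on the resample and predicting on the fixed test set, and Phi_train, theta_train, Phi_test are hypothetical arrays as above):
rng = np.random.default_rng(23)
n_train = len(Phi_train)
boot_preds = []
for _ in range(num_boots):
    # resample 60% of the training set with replacement
    idx = rng.choice(n_train, size=int(0.6 * n_train), replace=True)
    boot_preds.append(fit_and_predict(Phi_train[idx], theta_train[idx], Phi_test))
boot_preds = np.stack(boot_preds)  # (num_boots, n_test, num_params)
# e.g., 95% credible intervals for each parameter at each test point
lo, hi = np.percentile(boot_preds, [2.5, 97.5], axis=0)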
num_boots = 100
# bootstrap with a clean training set and noisy test data
model.bootstrap(num_boots, train_sigma=0, test_sigma=1.0)
model.plot_theta_boot()
model.plot_solution_boot()
model2 = Encoder(design)
# bootstrap with noisy training data and a clean test set
model2.bootstrap(num_boots, train_sigma=1.0, test_sigma=0)
model2.plot_theta_boot()
model2.plot_solution_boot()
model3 = Encoder(design)
# bootstrap with noise on both the training and test sets
model3.bootstrap(num_boots, train_sigma=1.0, test_sigma=1.0)
model3.plot_theta_boot()
model3.plot_solution_boot()